Analysis and classification of speech mode: whispered through shouted
Authors
Abstract
Variation in vocal effort is one of the most challenging problems in maintaining speech system performance for coding and for speech and speaker recognition. A change in vocal effort (or mode) is a fundamental change in speech production, not simply a change in volume. This is the first study to collectively consider the five speech modes: whispered, soft, neutral, loud and shouted. After corpus development, analysis is performed for (i) sound intensity level, (ii) duration and silence percentage, (iii) frame energy distribution and (iv) spectral tilt. The analysis reveals vocal-effort-dependent traits, which are then used to investigate speaker recognition. Matched vocal mode conditions yield a closed-set speaker ID rate of 97.62%, whereas mismatched vocal mode conditions yield 54.02%. Finally, a speech mode classification system is developed; its classification rates range from 44.5% to 98.5%, with errors arising from confusion with adjacent vocal modes. These advances can provide improved speech/speaker modeling information, as well as classified vocal mode knowledge, to improve speech and language technology in real scenarios.
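The abstract names frame energy distribution and silence percentage among the analyzed traits but does not say how they are computed. The following is a minimal sketch of one common way to obtain them, not the authors' method. The function names, the 400-sample frame and 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate), and the rule that a frame counts as silent when it lies more than 30 dB below the loudest frame are all illustrative assumptions.

```python
import math


def frame_energies_db(samples, frame_len=400, hop=160, floor=1e-12):
    """Return the mean-power log energy (dB) of each full frame.

    frame_len and hop are assumed values (25 ms / 10 ms at 16 kHz).
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        mean_power = sum(x * x for x in frame) / frame_len
        # The floor avoids log10(0) on digitally silent frames.
        energies.append(10.0 * math.log10(max(mean_power, floor)))
    return energies


def silence_percentage(energies_db, threshold_db=30.0):
    """Percentage of frames more than threshold_db below the loudest frame.

    The relative 30 dB threshold is an assumption made for this sketch.
    """
    if not energies_db:
        return 0.0
    peak = max(energies_db)
    silent = sum(1 for e in energies_db if e < peak - threshold_db)
    return 100.0 * silent / len(energies_db)


# Synthetic input: 0.5 s of a 200 Hz tone, then 0.5 s of a very faint tone.
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]
quiet = [1e-5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]
energies = frame_energies_db(tone + quiet)
print(f"{len(energies)} frames, {silence_percentage(energies):.1f}% silent")
```

The list returned by `frame_energies_db` can be histogrammed to compare energy distributions across vocal modes. A relative threshold is used here because absolute levels differ widely between whispered and shouted speech.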
Similar resources
Acoustic analysis and feature transformation from neutral to whisper for speaker identification within whispered speech audio streams
Whispered speech is an alternative speech production mode from neutral speech, which is used by talkers intentionally in natural conversational scenarios to protect privacy and to avoid certain content from being overheard or made public. Due to the profound differences between whispered and neutral speech in vocal excitation and vocal tract function, the performance of automatic speaker identi...
Cries and Whispers - Classification of Vocal Effort in Expressive Speech
The expansion of the video games industry raises innovative and challenging issues for speech technologies, e.g. the development of automatic content-based speech processing and speech recognition systems in the context of video games postproduction and voice casting. This paper presents a large-scale study on the classification of vocal effort in expressive speech for video games. Changes in v...
Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients
Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context....
A discriminative analysis within and across voiced and unvoiced consonants in neutral and whispered speech in multiple Indian languages
Whispered speech lacks the vocal cord vibration which is typically used to distinguish voiced and unvoiced consonants, making their discrimination a challenging task. In this work, we objectively and subjectively quantify the amount of discrimination between a voiced (V) consonant and its unvoiced (UV) counterpart using seven V-UV consonant pairs in six Indian languages, in neutral and whisper...
A comprehensive vowel space for whispered speech.
Whispered speech is a relatively common form of communication, used primarily to selectively exclude or include potential listeners from hearing a spoken message. Despite the everyday nature of whispering, and its undoubted usefulness in vocal communication, whispers have received relatively little research effort to date, apart from some studies analyzing the main whispered vowels and some q...